Semi-Supervised Noun Compound Analysis with Edge and Span Features

Authors

  • Yugo Murawaki
  • Sadao Kurohashi
Abstract

In this paper, we propose the use of spans in addition to edges in noun compound analysis. A span is a sequence of words that can represent a noun compound. Compared with edges, spans have good properties in terms of semi-supervised parsing. They can be reliably extracted from a huge amount of unannotated text. In addition, while the combinations of edges such as sibling and grandparent interactions are, in general, difficult to handle in parsing, it is quite easy to utilize spans with arbitrary width. We show that spans can be incorporated straightforwardly into the standard chart-based parsing algorithm. We create a semi-supervised discriminative parser that combines edge and span features. Experiments show that span features improve accuracy and that further gain is obtained when they are combined with edge features.

Title and abstract in Japanese

スパンとエッジ特徴量を用いた半教師あり名詞句解析

名詞句解析において,エッジだけでなくスパンを手がかりとして使うことを提案する.スパンは名詞句を表しうる単語列であり,エッジと比べて半教師あり学習に適した性質を持っている.すなわち,大量の生テキストから高い信頼性をもって抽出可能である.さらに,エッジは解析時に組み合わせ (兄弟や孫の関係など) を考えることが一般に難しいのに対して,スパンは任意の長さの組み合わせを自明に利用できる.この論文では,スパンが動的計画法による標準的な構文解析手法に簡単に組み込めることを示し,エッジとスパン特徴量を組み合わせた半教師ありの識別型構文解析器を提案する.実験により,スパン特徴量が解析精度を改善し,エッジと組み合わせることでさらに精度が向上することが示された.
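The claim that spans fit naturally into chart-based parsing can be illustrated with a small sketch. This is not the authors' parser: it is a minimal CKY-style bracketer written under assumptions that are not in the abstract. It assumes the compound is right-headed (the head of any span is its last word), and it uses hand-set toy scores where the paper would use weights learned discriminatively over edge and span features. The names `parse_compound`, `span_score`, and `edge_score` are hypothetical.

```python
from functools import lru_cache


def parse_compound(words, span_score, edge_score):
    """Return (best_score, bracketing) for a noun compound.

    Assumption (not from the paper): the compound is right-headed, so the
    head of words[i:j] is always words[j - 1]. The chart can then be indexed
    by span boundaries alone.
    """
    n = len(words)

    @lru_cache(maxsize=None)
    def best(i, j):
        # A single word is a leaf and contributes no score.
        if j - i == 1:
            return 0.0, words[i]
        candidates = []
        for k in range(i + 1, j):
            left_score, left_tree = best(i, k)
            right_score, right_tree = best(k, j)
            # Edge score: the head of the left part modifies the head of the right part.
            edge = edge_score(words[k - 1], words[j - 1])
            candidates.append((left_score + right_score + edge, (left_tree, right_tree)))
        score, tree = max(candidates, key=lambda c: c[0])
        # Span score: depends only on the word sequence words[i:j], whatever its width.
        return score + span_score(tuple(words[i:j])), tree

    return best(0, n)


# Toy scores standing in for statistics gathered from unannotated text.
toy_span_scores = {
    ("natural", "language"): 2.0,
    ("natural", "language", "processing"): 1.5,
}
toy_edge_scores = {("language", "processing"): 1.0}

score, tree = parse_compound(
    ["natural", "language", "processing", "toolkit"],
    lambda span: toy_span_scores.get(span, 0.0),
    lambda modifier, head: toy_edge_scores.get((modifier, head), 0.0),
)
print(score, tree)
```

Because a span score is attached to a chart cell rather than to a pair of words, adding it costs one lookup per cell and does not change the cubic-time search, whereas sibling or grandparent combinations of edges would require a larger chart state.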

Similar resources

An Analysis of Persian Compound Nouns as Constructions

In Construction Morphology (CM), a compound is treated as a construction at the word level with a systematic correlation between its form and meaning, in the sense that any change in the form is accompanied by a change in the meaning. Compound words are coined by compounding templates which are called abstract schemas in CM. These abstract constructional schemas generalize over sets of existing...

Glen, Glenda or Glendale: Unsupervised and Semi-supervised Learning of English Noun Gender

English pronouns like he and they reliably reflect the gender and number of the entities to which they refer. Pronoun resolution systems can use this fact to filter noun candidates that do not agree with the pronoun gender. Indeed, broad-coverage models of noun gender have proved to be the most important source of world knowledge in automatic pronoun resolution systems. Previous approaches pred...

MEFUASN: A Helpful Method to Extract Features using Analyzing Social Network for Fraud Detection

Fraud detection is one of the ways to cope with the damage caused by fraudulent activities, which have become common due to the rapid development of the Internet and electronic business. There is a need for methods that detect fraud accurately and quickly. To achieve accuracy, fraud detection methods need to consider both kinds of features: features based on user level and features based o...

Compound Embedding Features for Semi-supervised Learning

There has been a recent trend in discriminative methods of NLP to use representations of lexical items learned from unlabeled data as features, in order to overcome the problem of data sparsity. In this paper, we investigated the usage of word representations learned by neural language models, i.e. word embeddings. We built compound features of continuous word embeddings based on clustering to ...

From Topic Models to Semi-supervised Learning: Biasing Mixed-Membership Models to Exploit Topic-Indicative Features in Entity Clustering

We present methods to introduce different forms of supervision into mixed-membership latent variable models. First, we introduce a technique to bias the models to exploit topic-indicative features, i.e. features which are a priori known to be good indicators of the latent topics that generated them. Next, we present methods to modify the Gibbs sampler used for approximate inference in such mod...

Publication date: 2012